Pattern Discovery in Distributed Databases
Authors
Abstract
Most algorithms for learning and pattern discovery in data assume that all the needed data is available on one computer at a single site. This assumption does not hold when a number of independent databases reside on geographically distributed nodes of a computer network. These databases cannot be moved to a single site because of size, security, privacy, and data-ownership concerns, yet together they constitute the dataset in which patterns must be discovered. Some pattern discovery algorithms can be adapted to such situations, while others become inefficient or inapplicable. In this paper we show how a decision-tree induction algorithm may be adapted for distributed data situations. We also discuss some general issues relating to the adaptability of other pattern discovery algorithms to distributed data situations.
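As a rough illustration of the kind of adaptation the abstract describes, the sketch below computes the information gain of a candidate split from per-site count tables instead of from pooled rows. It assumes horizontally partitioned data (every site holds the same attributes for different records), which the abstract does not state, and it is not necessarily the paper's method. The function names `local_counts` and `information_gain` and the toy records are hypothetical.

```python
from collections import Counter
from math import log2

def local_counts(rows, attribute, label):
    """Runs at one site: count (attribute value, class) pairs locally.
    Only this small count table leaves the site, never the rows."""
    return Counter((row[attribute], row[label]) for row in rows)

def entropy(class_counts):
    """Shannon entropy (in bits) of a class-count table."""
    total = sum(class_counts.values())
    return -sum((n / total) * log2(n / total)
                for n in class_counts.values() if n)

def information_gain(merged):
    """Runs at the coordinator: uses only the merged count table."""
    by_class = Counter()
    by_value = {}
    for (value, cls), n in merged.items():
        by_class[cls] += n
        by_value.setdefault(value, Counter())[cls] += n
    total = sum(by_class.values())
    remainder = sum(sum(c.values()) / total * entropy(c)
                    for c in by_value.values())
    return entropy(by_class) - remainder

# Hypothetical toy data held at two separate sites.
site_a = [{"outlook": "sunny", "play": "no"},
          {"outlook": "rain", "play": "yes"}]
site_b = [{"outlook": "sunny", "play": "no"},
          {"outlook": "rain", "play": "yes"}]

# Each site reports its counts; the coordinator adds them together.
merged = (local_counts(site_a, "outlook", "play")
          + local_counts(site_b, "outlook", "play"))
gain = information_gain(merged)
```

Because the counts are additive across sites, the gain computed from the merged table matches what a single-site algorithm would compute on the pooled rows, under the partitioning assumption above.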
Similar resources
Saving whilst Gambling: An Empirical Analysis of U.K. Premium Bonds
Working papers are in draft form. This working paper is distributed for purposes of comment and discussion only. It may not be reproduced without permission of the copyright holder. Copies of working papers are available from the author.
Geographically Distributed Computing: ATM over the NASA ACTS Satellite
This paper outlines some of the problems and the solutions developed to support geographically distributed computing via ATM. In particular, applications developed with the Parallel Virtual Machine (PVM) [1] message passing library, communicating via ATM at OC3c speeds (155 Mbps) through the NASA ACTS satellite are considered. A primary goal of this work is to assess the suitability of an ATM-ba...
Collaborative Defence for Distributed Attacks (Case Study of Palestinian Information Systems)
In this paper, we develop a comprehensive approach for protecting national Palestinian information systems. We do not restrict our attention to protecting each individual organization, but rather focus on the entire ecosystem as a whole. Therefore, the developed system will be opened for participation for all Palestinian governmental and non-governmental organizations who are interested in impr...
Complexity of Protein–Protein Interaction Networks, Complexes, and Pathways
The focus of proteomic research in developing experimental techniques for protein identification and interaction studies is shifting from individual proteins to their organization in reaction pathways, complexes, and networks, i.e., to the proteome, the large-scale network comprising all protein–protein interactions in a cell, tissue...
Concurrency: A Case Study in Remote Tasking and Distributed I
Remote tasking encompasses different functionality, such as remote forking, multiple remote spawning, and task migration. In order to overcome the relatively high costs of these mechanisms, optimizations can be applied at various levels of the underlying operating system or application. Optimizations include concurrent message transmission, increased throughput and reduced latency at the distri...
Publication date: 1999